Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: Use a binary tree for large filters #152

Closed
wants to merge 5 commits into from

Conversation

drakenclimber
Copy link
Member

I still need to add doxygen comments in gen_pfc.c. It also looks like I need to update the copyright headers in a few spots with 2019.

@coveralls
Copy link

coveralls commented Apr 18, 2019

Coverage Status

Coverage increased (+1.02%) to 90.977% when pulling e8bea8e on drakenclimber:rfc-bintree-v2 into 5432e15 on seccomp:master.

@drakenclimber
Copy link
Member Author

@tyhicks, @tych0, @jdstrand or others - would you have time to review this? I would like to get this into the v2.5 libseccomp release if at all possible.

Also - @jdstrand and @pcmoore (and others, of course), do you have thoughts on the interface to enable/disable the binary tree? (Again, per this issue #116, it sounds like that would be nice to have.) Perhaps we could use an scmp_filter_attr for disabling it?

@pcmoore
Copy link
Member

pcmoore commented Oct 7, 2019

/me still needs to look at this

However, in regards to the question of enabling/disabling the binary tree, I'm going to stick with my earlier comments from #116 that a filter attribute is probably the best option, although I'm open to other ideas if anyone has some they want to share. Maybe SCMP_FLTATR_CTL_OPTIMIZE and have the current default take a value of '1' and the binary tree approach a value of '2' (leave '0' as a special value for now)?

@drakenclimber
Copy link
Member Author

I'm going to stick with my earlier comments from #116 that a filter attribute is probably the best option

Thanks. I had it in my head that a filter attribute was the best way to go about this, but I totally forgot that you had originally recommended it. The joys of getting old ;)

@drakenclimber
Copy link
Member Author

drakenclimber commented Oct 9, 2019

I made two major changes:

  • Rebased on top of the latest master
  • Added a filter attribute to enable/disable the binary tree

Also, the tests are getting rather lengthy to run. Should we consider adding a --fast or --full option to the test suite?

Prior to these changes:

### python tests were disabled for this run
$ time make check
...
Regression Test Summary
 tests run: 8030
 tests skipped: 42
 tests passed: 8030
 tests failed: 0
 tests errored: 0
============================================================
PASS: regression
=============
1 test passed
=============
Making check in doc

real    5m53.913s
user    2m28.860s
sys     2m55.914s

And now with the binary tree tests added in:

### python tests were disabled for this run
$ time make check
...
Regression Test Summary
 tests run: 8362
 tests skipped: 43
 tests passed: 8362
 tests failed: 0
 tests errored: 0
============================================================
PASS: regression
=============
1 test passed
=============
Making check in doc

real    7m21.696s
user    4m9.202s
sys     3m9.842s

src/gen_pfc.c Show resolved Hide resolved
include/seccomp.h.in Outdated Show resolved Hide resolved
include/seccomp.h.in Outdated Show resolved Hide resolved
src/db.c Show resolved Hide resolved
src/gen_bpf.c Outdated Show resolved Hide resolved
src/gen_bpf.c Outdated Show resolved Hide resolved
src/gen_bpf.c Outdated
i++;
}

return i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm missing some finer point, but can't you simply divide by two here(return syscall_cnt / 2;)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... I'm not sure. I remember this function was trickier to write than I expected, and I grudgingly settled on this as it seemed the best balance of readability and speed. I'll look through it again.

src/gen_bpf.c Outdated Show resolved Hide resolved
src/gen_bpf.c Outdated Show resolved Hide resolved
src/gen_bpf.c Outdated Show resolved Hide resolved
src/gen_bpf.c Show resolved Hide resolved
@drakenclimber drakenclimber force-pushed the rfc-bintree-v2 branch 4 times, most recently from db0c6ab to 9919b54 Compare November 13, 2019 22:43
src/gen_bpf.c Outdated Show resolved Hide resolved
src/gen_bpf.c Outdated Show resolved Hide resolved
src/gen_bpf.c Outdated

for (i = 0; i < bintree_levels; i++) {
bintree_syscalls[i] = INVALID_SYSCALL;
bintree_hashes[i] = INVALID_HSH;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that _gen_bpf_arch() is already a bit unwieldy but I wonder if we can move some of the binary tree creation into another function, much like you did with the sort?

src/gen_bpf.c Outdated
*/
bintree_hashes[j] = b_bintree->hash;
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That entire "build the binary tree" for loop seems like a good candidate for moving into its own function, yes/no?

Copy link
Member

@pcmoore pcmoore Nov 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking at the code below, I'm now thinking that maybe the entire existing "create the syscall filters and add them to block list group" should be split out into separate optimization specific functions like you did with sort.

Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree - _gen_bpf_arch is way too monolithic now.

I'll see what I can do to divide it into some smaller more sane functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to split _gen_bpf_arch() up into a few smaller functions. I think the division makes sense, but staring at that code too long may have warped my point of view ;)

@drakenclimber drakenclimber force-pushed the rfc-bintree-v2 branch 2 times, most recently from 2e7089f to 62223dc Compare November 15, 2019 20:18
@pcmoore pcmoore added this to the v2.5 milestone Feb 23, 2020
@pcmoore
Copy link
Member

pcmoore commented Feb 23, 2020

@drakenclimber do you have a revision of these patches that haven't been pushed to your PR branch? I'm asking because I see a few comments which are marked as resolved, but the code that can be reviewed in the GH web UI doesn't reflect the resolution ... ?

@pcmoore
Copy link
Member

pcmoore commented Feb 24, 2020

Removing this PR from the v2.5 milestone because we are already tracking this RFE for v2.5 in #116.

@pcmoore pcmoore removed this from the v2.5 milestone Feb 24, 2020
@drakenclimber
Copy link
Member Author

@drakenclimber do you have a revision of these patches that haven't been pushed to your PR branch? I'm asking because I see a few comments which are marked as resolved, but the code that can be reviewed in the GH web UI doesn't reflect the resolution ... ?

Strange. All the changes I mentioned above should be pushed to this branch. I'll rebase it on top of the current master and see if that appeases - or confuses further :) - the github GUI.

In _gen_bpf_arch(), there was an identical block of code to sort
the primary database syscalls and the secondary database
syscalls.  This commit refactors those duplicated, inline loops
into a single function.

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
This commit splits out some init code and a lengthy
for-loop in _gen_bpf_arch() into its own function -
_gen_bpf_syscalls().

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
This commit adds a function - _gen_bpf_insert() - that
inserts an instruction block into the BPF state and
creates the linked list connections for that newly-inserted
block.

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
@drakenclimber
Copy link
Member Author

@drakenclimber do you have a revision of these patches that haven't been pushed to your PR branch? I'm asking because I see a few comments which are marked as resolved, but the code that can be reviewed in the GH web UI doesn't reflect the resolution ... ?

Okay, I figured out what I did. I put all of the changes into a temporary branch because I wasn't sure how many more comments were going to come in. My vision was that I was going to then collate them into one updated commit rather than spamming github with a million tiny changes. But the collated commit didn't get pushed here. I'll make sure it's cleaned up and then push it here. Thanks.

This patch adds a filter attribute, SCMP_FLTATR_CTL_OPTIMIZE,
to specify the optimization level of the seccomp filter:
	0 - currently unused
	1 - rules weighted by priority and complexity (default)
	2 - binary tree sorted by syscall number

Several in-house customers have identified that their large
seccomp filters are slowing down their applications.  Their
filters largely consist of simple allow/deny logic for many
syscalls (306 in one case) and for the most part don't utilize
argument filtering.

I modified gen_bpf.c and gen_pfc.c to utilize a cBPF binary tree
if the user has requested optimize level 2.  I then timed
calling getppid() in a loop using one of my customer's seccomp
filters.  I ran this loop one million times and recorded the min,
max, and mean times (in TSC ticks) to call getppid().  (I didn't
disable interrupts, so the max time was often large.)  I chose
to report the minimum time because I feel it best represents the
actual time to traverse the syscall.

Test Case                      minimum TSC ticks to make syscall
----------------------------------------------------------------
seccomp disabled                                             138
getppid() at the front of 306-syscall seccomp filter         256
getppid() in middle of 306-syscall seccomp filter            516
getppid() at the end of the 306-syscall filter              1942
getppid() in a binary tree                                   312

As shown in the table above, a binary tree can signficantly improve
syscall performance in the average and worst case scenario for these
customers.

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
This commit adds tests to ensure the validity of the
binary tree and the resultant pfc and bpf output.

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
@drakenclimber
Copy link
Member Author

@drakenclimber do you have a revision of these patches that haven't been pushed to your PR branch? I'm asking because I see a few comments which are marked as resolved, but the code that can be reviewed in the GH web UI doesn't reflect the resolution ... ?

Okay, I figured out what I did. I put all of the changes into a temporary branch because I wasn't sure how many more comments were going to come in. My vision was that I was going to then collate them into one updated commit rather than spamming github with a million tiny changes. But the collated commit didn't get pushed here. I'll make sure it's cleaned up and then push it here. Thanks.

Should be good to go now. My apologies.

@pcmoore
Copy link
Member

pcmoore commented Feb 26, 2020

No worries, it happens. Thanks for the updated PR.

@pcmoore
Copy link
Member

pcmoore commented Feb 28, 2020

Tsk tsk @drakenclimber, ./tools/check-syntax is failing parts of these patches ;)

No worries, it's minor and easily fixed while merging.

@pcmoore
Copy link
Member

pcmoore commented Feb 28, 2020

This looks good, thanks @drakenclimber.

Assuming this make check passes on my laptop I'll push this to master shortly.

@pcmoore
Copy link
Member

pcmoore commented Feb 28, 2020

Everything passed, this is now in the master branch with HEAD 38f04da.

@pcmoore pcmoore closed this Feb 28, 2020
@drakenclimber
Copy link
Member Author

Awesome!!! Thanks, @pcmoore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants